<!DOCTYPE html>
<html lang="en"><head>
  <meta charset="utf-8">
  <meta http-equiv="X-UA-Compatible" content="IE=edge">
  <meta name="viewport" content="width=device-width, initial-scale=1">

  <title>Caffe Source Code Reading: The innerproduct Layer</title>
  <meta name="description" content="Linghan Cheung lhcheung1991@gmail.com (Please leave your comments and suggestions at the bottom of the article; Gitment preserves them as GitHub issues for later reference) Abstract: Deep learning has achieved remarkable success in computer vision, and many problems that classical methods could not solve are being cracked one by one. As the research community explores the potential of this new...">

  <link rel="stylesheet" href="/assets/main.css">
  <link rel="canonical" href="/blogs/2018/04/16/caffe-source-reading-innerprodoct.html">
  <link rel="alternate" type="application/rss+xml" title="Linghan Cheung&#39;s Site" href="/feed.xml">
  <script type="text/javascript" src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.7/MathJax.js?config=TeX-AMS-MML_HTMLorMML"></script>


  
  
</head>
<body><header class="site-header" role="banner">

  <div class="wrapper"><a class="site-title" rel="author" href="/">Linghan Cheung&#39;s Site</a><nav class="site-nav">
        <input type="checkbox" id="nav-trigger" class="nav-trigger" />
        <label for="nav-trigger">
          <span class="menu-icon">
            <svg viewBox="0 0 18 15" width="18px" height="15px">
              <path fill="#424242" d="M18,1.484c0,0.82-0.665,1.484-1.484,1.484H1.484C0.665,2.969,0,2.304,0,1.484l0,0C0,0.665,0.665,0,1.484,0 h15.031C17.335,0,18,0.665,18,1.484L18,1.484z"/>
              <path fill="#424242" d="M18,7.516C18,8.335,17.335,9,16.516,9H1.484C0.665,9,0,8.335,0,7.516l0,0c0-0.82,0.665-1.484,1.484-1.484 h15.031C17.335,6.031,18,6.696,18,7.516L18,7.516z"/>
              <path fill="#424242" d="M18,13.516C18,14.335,17.335,15,16.516,15H1.484C0.665,15,0,14.335,0,13.516l0,0 c0-0.82,0.665-1.484,1.484-1.484h15.031C17.335,12.031,18,12.696,18,13.516L18,13.516z"/>
            </svg>
          </span>
        </label>

        <div class="trigger"><a class="page-link" href="/about/">About</a></div>
      </nav></div>
</header>
<main class="page-content" aria-label="Content">
      <div class="wrapper">
        <article class="post" itemscope itemtype="http://schema.org/BlogPosting">

  <header class="post-header">
    <h1 class="post-title" itemprop="name headline">Caffe Source Code Reading: The innerproduct Layer</h1>
    <p class="post-meta">
      <time datetime="2018-04-16T00:00:00+00:00" itemprop="datePublished">
        
        Apr 16, 2018
      </time>
      </p>
      <style>
        table{
          border-collapse: collapse;
          border-spacing: 0;
          border:2px solid #ff0000;
          margin: auto;
        }
        th{
          border:2px solid #000000;
        }
        td{
          border:1px solid #000000;
        }     
      </style>
  </header>

  <div class="post-content" itemprop="articleBody">
    <p style="text-align: center;">Linghan Cheung</p>
<p style="text-align: center;">lhcheung1991@gmail.com</p>
<p style="text-align: center;">(Please leave your comments and suggestions at the bottom of this article; <a href="https://github.com/imsun/gitment">Gitment</a> will preserve them as GitHub issues for later reference)</p>
<h2 id="摘要">Abstract</h2>
<hr />
<p><br />
  Deep learning has achieved remarkable success in computer vision, and many problems that classical methods could not solve are being cracked one by one. As the research community explores the potential of this new territory together, high computational cost, the reproducibility of new algorithms, and the usability of new methods all constrain progress, which makes an efficient, extensible, and easy-to-use deep learning framework essential. <script type="math/tex">Caffe^{[1]}</script>, open-sourced in 2014, uses NVIDIA GPUs as its main compute platform and achieves very high runtime efficiency; its code base is clearly organized and easy to extend, so it was welcomed by the academic community as soon as it was released. Although more modern frameworks such as <script type="math/tex">PyTorch^{[2]}</script> and <script type="math/tex">TensorFlow^{[3]}</script> now exist, Caffe's overall strengths still earn it a sizable following in the deep learning systems community: many state-of-the-art algorithms, such as the R-CNN family and SSD for object detection, are implemented in Caffe, and its layer-based way of assembling neural networks has deeply influenced many later frameworks. For engineers building deep learning systems, Caffe remains a rich source of lessons. The main work of this article is as follows:</p>

<ol>
  <li>Analyze the forward and backward computations performed by innerproduct when gradients are derived with the chain rule;</li>
  <li>Analyze the <script type="math/tex">GEMM(GEneral \ Matrix \ Multiplication)^{[6]}</script> routine used by Caffe, and how the innerproduct computation is implemented with GEMM.</li>
</ol>

<p><br /></p>
<h2 id="innerproduct-的运算原理">How innerproduct Computes</h2>
<hr />
<p><br />
  innerproduct corresponds to the fully connected layer of a neural network. Because a fully connected layer essentially computes the inner product of each input vector with the weight vectors stored in the weight matrix, Caffe names it innerproduct. For a fully connected layer with a single input, the computation is:</p>

<script type="math/tex; mode=display">\vec{y} = \vec{x}W + \vec{b}\tag{1}</script>

<p>  where <script type="math/tex">\vec{x}</script> is a <script type="math/tex">1 \times K</script> row vector, the input; <script type="math/tex">W</script> is the <script type="math/tex">K \times N</script> weight matrix, each column of which is a weight vector holding the parameters of one neuron of the fully connected layer; <script type="math/tex">\vec{b}</script> is the <script type="math/tex">1 \times N</script> bias row vector; and <script type="math/tex">\vec{y}</script> is the <script type="math/tex">1 \times N</script> row vector of outputs. Since networks are usually trained with <script type="math/tex">Stochastic \ Gradient \ Descent^{[4]}</script>, the input is a batch of data, and the single-input computation above becomes:</p>

<script type="math/tex; mode=display">Y = XW + B\tag{2}</script>

<p>  where <script type="math/tex">X</script> is an <script type="math/tex">M \times K</script> matrix whose rows are the inputs; <script type="math/tex">W</script> is the <script type="math/tex">K \times N</script> weight matrix, each column of which holds the parameters of one neuron of the fully connected layer; and <script type="math/tex">Y</script> is the <script type="math/tex">M \times N</script> matrix whose rows are the corresponding outputs. Since the bias is added to every neuron's output, B here is:</p>

<script type="math/tex; mode=display">B_{M,N} = [1, ..., 1]_{1,M}^{T} \times \vec{b}\tag{3}</script>

<p>  Suppose the final loss function is:</p>

<script type="math/tex; mode=display">L(X) = Loss(Y, label) = Loss(XW + B, label)\tag{4}</script>

<p>  To optimize the model with a gradient-based algorithm, we must compute the gradients of the loss with respect to the parameters <script type="math/tex">W, b</script>, namely <script type="math/tex">\frac{\partial{Loss}}{\partial{W}}</script> and <script type="math/tex">\frac{\partial{Loss}}{\partial{b}}</script>. We also need <script type="math/tex">\frac{\partial{Loss}}{\partial{X}}</script>: since gradients are propagated by the chain rule and <script type="math/tex">X</script> may itself be the output of an earlier part of the network, <script type="math/tex">\frac{\partial{Loss}}{\partial{X}}</script> is needed to differentiate the parameters of those earlier layers. The heart of innerproduct, then, is computing the following three partial derivatives:</p>

<script type="math/tex; mode=display">% <![CDATA[
\begin{eqnarray}
\frac{\partial{Loss}}{\partial{W}} & = & \frac{\partial{Loss}}{\partial{Y}} \cdot \frac{\partial{Y}}{\partial{W}}\tag{5} \\
\frac{\partial{Loss}}{\partial{b}} & = & \frac{\partial{Loss}}{\partial{Y}} \cdot \frac{\partial{Y}}{\partial{b}}\tag{6} \\
\frac{\partial{Loss}}{\partial{X}} & = & \frac{\partial{Loss}}{\partial{Y}} \cdot \frac{\partial{Y}}{\partial{X}}\tag{7}
\end{eqnarray} %]]></script>

<p>  Expanding <script type="math/tex">Equation \ (5)</script>, for <script type="math/tex">\frac{\partial{Y}}{\partial{W}}</script> we have:</p>

<script type="math/tex; mode=display">% <![CDATA[
\begin{eqnarray}
\because Y_{i,j} & = & \sum_{k=1}^{K}{X_{i,k} \cdot W_{k,j}}\tag{8} \\
\therefore \frac{\partial{Y_{i,j}}}{\partial{W_{k,j}}} & = & X_{i,k}, k=1, 2, ..., K\tag{9}
\end{eqnarray} %]]></script>

<p>  and, for a given <script type="math/tex">W_{k,j}</script>, varying the input index we have:</p>

<script type="math/tex; mode=display">\frac{\partial{Y_{i,j}}}{\partial{W_{k,j}}} = X_{i,k}, i=1, 2, ..., M\tag{10}</script>

<p>  From <script type="math/tex">Equation \ (8)(9)(10)</script> we can see that, within the innerproduct computation, <script type="math/tex">W_{k,j}</script> interacts only with <script type="math/tex">X_{i,k}</script>; the same therefore holds when computing <script type="math/tex">\frac{\partial{Y}}{\partial{W_{k,j}}}</script>, so:</p>

<script type="math/tex; mode=display">\frac{\partial{Y}}{\partial{W_{k,j}}} = \frac{\partial{Y_{i,j}}}{\partial{W_{k,j}}} = X_{i,k}, i=1, 2, ..., M\tag{11}</script>

<p>  From <script type="math/tex">Equation \ (5)(11)</script> we obtain:</p>

<script type="math/tex; mode=display">% <![CDATA[
\begin{eqnarray}
\frac{\partial{Loss}}{\partial{W_{k,j}}} & = & \frac{\partial{Loss}}{\partial{Y}} \cdot \frac{\partial{Y}}{\partial{W_{k,j}}} \\
& = & \sum_{i=1}^{M}{\frac{\partial{Loss}}{\partial{Y_{i,j}}} \cdot \frac{\partial{Y_{i,j}}}{\partial{W_{k,j}}}} \\
& = & \sum_{i=1}^{M}{\frac{\partial{Loss}}{\partial{Y_{i,j}}} \cdot X_{i,k}}\tag{12}
\end{eqnarray} %]]></script>

<p>  From <script type="math/tex">Equation \ (5)(12)</script> we obtain:</p>

<script type="math/tex; mode=display">\frac{\partial{Loss}}{\partial{W}} = [[\frac{\partial{Loss}}{\partial{Y}}]^{T} \cdot X]^{T}\tag{13}</script>

<p>  <script type="math/tex">Equation \ (6)</script> follows by a similar derivation, which finally yields:</p>

<script type="math/tex; mode=display">\frac{\partial{Loss}}{\partial{b}} = [[\frac{\partial{Loss}}{\partial{Y}}]^{T} \cdot [1, ..., 1]_{1,M}^{T}]^{T}\tag{14}</script>

<p>  For <script type="math/tex">Equation \ (7)</script>, expanding via <script type="math/tex">(8)</script>, for <script type="math/tex">\frac{\partial{Y}}{\partial{X}}</script> we have:</p>

<script type="math/tex; mode=display">% <![CDATA[
\begin{eqnarray}
\frac{\partial{Y_{i,j}}}{\partial{X_{i,k}}} & = & W_{k,j}, k=1, 2, ..., K\tag{15}
\end{eqnarray} %]]></script>

<p>  and, for a given <script type="math/tex">X_{i,k}</script>, varying the output index we have:</p>

<script type="math/tex; mode=display">\frac{\partial{Y_{i,j}}}{\partial{X_{i,k}}} = W_{k,j}, j=1, 2, ..., N\tag{16}</script>

<p>  From <script type="math/tex">Equation \ (15)(16)</script> we can see that, within the innerproduct computation, <script type="math/tex">X_{i,k}</script> interacts only with <script type="math/tex">W_{k,j}</script>; the same therefore holds when computing <script type="math/tex">\frac{\partial{Y}}{\partial{X_{i,k}}}</script>, so:</p>

<script type="math/tex; mode=display">\frac{\partial{Y}}{\partial{X_{i,k}}} = \frac{\partial{Y_{i,j}}}{\partial{X_{i,k}}} = W_{k,j}, j=1, 2, ..., N\tag{17}</script>

<p>  From <script type="math/tex">Equation \ (7)(17)</script> we obtain:</p>

<script type="math/tex; mode=display">% <![CDATA[
\begin{eqnarray}
\frac{\partial{Loss}}{\partial{X_{i,k}}} & = & \frac{\partial{Loss}}{\partial{Y}} \cdot \frac{\partial{Y}}{\partial{X_{i,k}}} \\
& = & \sum_{j=1}^{N}{\frac{\partial{Loss}}{\partial{Y_{i,j}}} \cdot \frac{\partial{Y_{i,j}}}{\partial{X_{i,k}}}} \\
& = & \sum_{j=1}^{N}{\frac{\partial{Loss}}{\partial{Y_{i,j}}} \cdot W_{k,j}}\tag{18}
\end{eqnarray} %]]></script>

<p>  From <script type="math/tex">Equation \ (7)(18)</script> we obtain:</p>

<script type="math/tex; mode=display">\frac{\partial{Loss}}{\partial{X}} = \frac{\partial{Loss}}{\partial{Y}} \cdot W^{T} \tag{19}</script>

<p>  In summary, <script type="math/tex">Equation \ (2)</script> and <script type="math/tex">Equation \ (13)(14)(19)</script> describe, respectively, the forward and backward computations performed by Caffe's innerproduct layer. Collected together:</p>

<script type="math/tex; mode=display">% <![CDATA[
\begin{eqnarray}
Y & = & XW + B\tag{2} \\
\frac{\partial{Loss}}{\partial{W}} & = & [[\frac{\partial{Loss}}{\partial{Y}}]^{T} \cdot X]^{T}\tag{13} \\
\frac{\partial{Loss}}{\partial{b}} & = & [[\frac{\partial{Loss}}{\partial{Y}}]^{T} \cdot [1, ..., 1]_{1,M}^{T}]^{T}\tag{14} \\
\frac{\partial{Loss}}{\partial{X}} & = & \frac{\partial{Loss}}{\partial{Y}} \cdot W^{T} \tag{19}
\end{eqnarray} %]]></script>

<p><br /></p>
<h2 id="caffe-中-innerproduct-的实现">The innerproduct Implementation in Caffe</h2>
<hr />
<p><br />
  The discussion in the previous section shows that the core operation of innerproduct is GEMM. In fact, the core computations of modern neural networks and deep learning frameworks are essentially all GEMM operations<script type="math/tex">^{[5][6][7]}</script>, so Caffe's implementation of the innerproduct layer is built around GEMM.</p>

<p>  In <a href="https://github.com/BVLC/caffe/blob/master/src/caffe/util/math_functions.cpp#L13">caffe/src/caffe/util/math_functions.cpp</a>, Caffe wraps the GEMM routine as shown below:</p>
<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">template</span><span class="o">&lt;&gt;</span>
<span class="kt">void</span> <span class="n">caffe_cpu_gemm</span><span class="o">&lt;</span><span class="kt">float</span><span class="o">&gt;</span><span class="p">(</span><span class="k">const</span> <span class="n">CBLAS_TRANSPOSE</span> <span class="n">TransA</span><span class="p">,</span>
    <span class="k">const</span> <span class="n">CBLAS_TRANSPOSE</span> <span class="n">TransB</span><span class="p">,</span> <span class="k">const</span> <span class="kt">int</span> <span class="n">M</span><span class="p">,</span> <span class="k">const</span> <span class="kt">int</span> <span class="n">N</span><span class="p">,</span> <span class="k">const</span> <span class="kt">int</span> <span class="n">K</span><span class="p">,</span>
    <span class="k">const</span> <span class="kt">float</span> <span class="n">alpha</span><span class="p">,</span> <span class="k">const</span> <span class="kt">float</span><span class="o">*</span> <span class="n">A</span><span class="p">,</span> <span class="k">const</span> <span class="kt">float</span><span class="o">*</span> <span class="n">B</span><span class="p">,</span> <span class="k">const</span> <span class="kt">float</span> <span class="n">beta</span><span class="p">,</span>
    <span class="kt">float</span><span class="o">*</span> <span class="n">C</span><span class="p">)</span> <span class="p">{</span>
  <span class="kt">int</span> <span class="n">lda</span> <span class="o">=</span> <span class="p">(</span><span class="n">TransA</span> <span class="o">==</span> <span class="n">CblasNoTrans</span><span class="p">)</span> <span class="o">?</span> <span class="n">K</span> <span class="o">:</span> <span class="n">M</span><span class="p">;</span>
  <span class="kt">int</span> <span class="n">ldb</span> <span class="o">=</span> <span class="p">(</span><span class="n">TransB</span> <span class="o">==</span> <span class="n">CblasNoTrans</span><span class="p">)</span> <span class="o">?</span> <span class="n">N</span> <span class="o">:</span> <span class="n">K</span><span class="p">;</span>
  <span class="n">cblas_sgemm</span><span class="p">(</span><span class="n">CblasRowMajor</span><span class="p">,</span> <span class="n">TransA</span><span class="p">,</span> <span class="n">TransB</span><span class="p">,</span> <span class="n">M</span><span class="p">,</span> <span class="n">N</span><span class="p">,</span> <span class="n">K</span><span class="p">,</span> <span class="n">alpha</span><span class="p">,</span> <span class="n">A</span><span class="p">,</span> <span class="n">lda</span><span class="p">,</span> <span class="n">B</span><span class="p">,</span>
      <span class="n">ldb</span><span class="p">,</span> <span class="n">beta</span><span class="p">,</span> <span class="n">C</span><span class="p">,</span> <span class="n">N</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>
<p>Here we only walk through the single-precision version. The function <code class="highlighter-rouge">void caffe_cpu_gemm&lt;float&gt;(...)</code> above calls <code class="highlighter-rouge">void cblas_sgemm(...)</code> from <code class="highlighter-rouge">cblas.h</code>, which computes <script type="math/tex">C = \alpha \cdot A \cdot B + \beta \cdot C</script>: the product of matrices A and B scaled by <script type="math/tex">\alpha</script>, plus the matrix C scaled by <script type="math/tex">\beta</script>, where A is <script type="math/tex">M \times K</script>, B is <script type="math/tex">K \times N</script>, and C is <script type="math/tex">M \times N</script> <script type="math/tex">^{[8]}</script>. In <a href="https://github.com/BVLC/caffe/blob/master/src/caffe/util/math_functions.cu#L13">caffe/src/caffe/util/math_functions.cu</a>, Caffe provides a GPU implementation on top of cuBLAS whose logic mirrors the CPU version, shown below. One caveat: cuBLAS uses the same column-major storage as Fortran, so the row and column counts must be swapped accordingly:</p>
<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">template</span> <span class="o">&lt;&gt;</span>
<span class="kt">void</span> <span class="n">caffe_gpu_gemm</span><span class="o">&lt;</span><span class="kt">float</span><span class="o">&gt;</span><span class="p">(</span><span class="k">const</span> <span class="n">CBLAS_TRANSPOSE</span> <span class="n">TransA</span><span class="p">,</span>
    <span class="k">const</span> <span class="n">CBLAS_TRANSPOSE</span> <span class="n">TransB</span><span class="p">,</span> <span class="k">const</span> <span class="kt">int</span> <span class="n">M</span><span class="p">,</span> <span class="k">const</span> <span class="kt">int</span> <span class="n">N</span><span class="p">,</span> <span class="k">const</span> <span class="kt">int</span> <span class="n">K</span><span class="p">,</span>
    <span class="k">const</span> <span class="kt">float</span> <span class="n">alpha</span><span class="p">,</span> <span class="k">const</span> <span class="kt">float</span><span class="o">*</span> <span class="n">A</span><span class="p">,</span> <span class="k">const</span> <span class="kt">float</span><span class="o">*</span> <span class="n">B</span><span class="p">,</span> <span class="k">const</span> <span class="kt">float</span> <span class="n">beta</span><span class="p">,</span>
    <span class="kt">float</span><span class="o">*</span> <span class="n">C</span><span class="p">)</span> <span class="p">{</span>
  <span class="c1">// Note that cublas follows fortran order.
</span>  <span class="kt">int</span> <span class="n">lda</span> <span class="o">=</span> <span class="p">(</span><span class="n">TransA</span> <span class="o">==</span> <span class="n">CblasNoTrans</span><span class="p">)</span> <span class="o">?</span> <span class="n">K</span> <span class="o">:</span> <span class="n">M</span><span class="p">;</span>
  <span class="kt">int</span> <span class="n">ldb</span> <span class="o">=</span> <span class="p">(</span><span class="n">TransB</span> <span class="o">==</span> <span class="n">CblasNoTrans</span><span class="p">)</span> <span class="o">?</span> <span class="n">N</span> <span class="o">:</span> <span class="n">K</span><span class="p">;</span>
  <span class="n">cublasOperation_t</span> <span class="n">cuTransA</span> <span class="o">=</span>
      <span class="p">(</span><span class="n">TransA</span> <span class="o">==</span> <span class="n">CblasNoTrans</span><span class="p">)</span> <span class="o">?</span> <span class="n">CUBLAS_OP_N</span> <span class="o">:</span> <span class="n">CUBLAS_OP_T</span><span class="p">;</span>
  <span class="n">cublasOperation_t</span> <span class="n">cuTransB</span> <span class="o">=</span>
      <span class="p">(</span><span class="n">TransB</span> <span class="o">==</span> <span class="n">CblasNoTrans</span><span class="p">)</span> <span class="o">?</span> <span class="n">CUBLAS_OP_N</span> <span class="o">:</span> <span class="n">CUBLAS_OP_T</span><span class="p">;</span>
  <span class="n">CUBLAS_CHECK</span><span class="p">(</span><span class="n">cublasSgemm</span><span class="p">(</span><span class="n">Caffe</span><span class="o">::</span><span class="n">cublas_handle</span><span class="p">(),</span> <span class="n">cuTransB</span><span class="p">,</span> <span class="n">cuTransA</span><span class="p">,</span>
      <span class="n">N</span><span class="p">,</span> <span class="n">M</span><span class="p">,</span> <span class="n">K</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">alpha</span><span class="p">,</span> <span class="n">B</span><span class="p">,</span> <span class="n">ldb</span><span class="p">,</span> <span class="n">A</span><span class="p">,</span> <span class="n">lda</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">beta</span><span class="p">,</span> <span class="n">C</span><span class="p">,</span> <span class="n">N</span><span class="p">));</span>
<span class="p">}</span>
</code></pre></div></div>
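<p>  To make the semantics of these wrappers concrete, the naive reference below implements the row-major, no-transpose case of <script type="math/tex">C = \alpha \cdot A \cdot B + \beta \cdot C</script>, including the lda = K, ldb = N leading dimensions that the CPU wrapper passes. This is a sketch for illustration (gemm_ref is our own name), not a BLAS or Caffe routine:</p>

```cpp
#include <cassert>

// Reference semantics of the row-major, no-transpose case of cblas_sgemm:
// C = alpha * A * B + beta * C, with A: MxK (lda = K), B: KxN (ldb = N),
// and C: MxN, all stored contiguously row by row. Illustration only --
// gemm_ref is our own name, not the BLAS call itself.
void gemm_ref(int M, int N, int K, float alpha, const float* A,
              const float* B, float beta, float* C) {
  for (int i = 0; i < M; ++i) {
    for (int j = 0; j < N; ++j) {
      float acc = 0.0f;
      for (int k = 0; k < K; ++k)
        acc += A[i * K + k] * B[k * N + j];   // lda = K, ldb = N
      C[i * N + j] = alpha * acc + beta * C[i * N + j];
    }
  }
}
```

<p>Note how beta lets the caller accumulate into an existing C; Caffe's backward pass relies on this to sum gradient contributions without a separate addition.</p>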

<p>  In <a href="https://github.com/BVLC/caffe/blob/master/include/caffe/layers/inner_product_layer.hpp#L42">caffe/include/caffe/layers/inner_product_layer.hpp</a>, to configure the matrix computation of innerproduct, class <code class="highlighter-rouge">InnerProductLayer</code>, which inherits from class <code class="highlighter-rouge">Layer</code>, adds the following data members; <code class="highlighter-rouge">M_, K_, N_</code> carry the same meaning as the matrix dimensions in <script type="math/tex">Equation \ (2)</script>:</p>
<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code>  <span class="kt">int</span> <span class="n">M_</span><span class="p">;</span>
  <span class="kt">int</span> <span class="n">K_</span><span class="p">;</span>
  <span class="kt">int</span> <span class="n">N_</span><span class="p">;</span>
  <span class="kt">bool</span> <span class="n">bias_term_</span><span class="p">;</span>
  <span class="n">Blob</span><span class="o">&lt;</span><span class="n">Dtype</span><span class="o">&gt;</span> <span class="n">bias_multiplier_</span><span class="p">;</span>
  <span class="kt">bool</span> <span class="n">transpose_</span><span class="p">;</span>  <span class="c1">///&lt; if true, assume transposed weights
</span></code></pre></div></div>

<p>Every layer in Caffe inherits from class <code class="highlighter-rouge">Layer</code> and implements the virtual functions that <code class="highlighter-rouge">void SetUp(...)</code>, shown below, calls to initialize the layer's configuration. In class <code class="highlighter-rouge">InnerProductLayer</code>, this initialization is done mainly in <code class="highlighter-rouge">void LayerSetUp(...)</code> and <code class="highlighter-rouge">void Reshape(...)</code>:</p>
<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code>  <span class="kt">void</span> <span class="nf">SetUp</span><span class="p">(</span><span class="k">const</span> <span class="n">vector</span><span class="o">&lt;</span><span class="n">Blob</span><span class="o">&lt;</span><span class="n">Dtype</span><span class="o">&gt;*&gt;&amp;</span> <span class="n">bottom</span><span class="p">,</span>
      <span class="k">const</span> <span class="n">vector</span><span class="o">&lt;</span><span class="n">Blob</span><span class="o">&lt;</span><span class="n">Dtype</span><span class="o">&gt;*&gt;&amp;</span> <span class="n">top</span><span class="p">)</span> <span class="p">{</span>
    <span class="n">CheckBlobCounts</span><span class="p">(</span><span class="n">bottom</span><span class="p">,</span> <span class="n">top</span><span class="p">);</span>
    <span class="n">LayerSetUp</span><span class="p">(</span><span class="n">bottom</span><span class="p">,</span> <span class="n">top</span><span class="p">);</span>
    <span class="n">Reshape</span><span class="p">(</span><span class="n">bottom</span><span class="p">,</span> <span class="n">top</span><span class="p">);</span>
    <span class="n">SetLossWeights</span><span class="p">(</span><span class="n">top</span><span class="p">);</span>
  <span class="p">}</span>
</code></pre></div></div>

<p>In <a href="https://github.com/BVLC/caffe/blob/master/src/caffe/layers/inner_product_layer.cpp#L9">caffe/src/caffe/layers/inner_product_layer.cpp</a>, class <code class="highlighter-rouge">InnerProductLayer</code> implements the virtual function <code class="highlighter-rouge">void LayerSetUp(...)</code>:</p>
<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">template</span> <span class="o">&lt;</span><span class="k">typename</span> <span class="n">Dtype</span><span class="o">&gt;</span>
<span class="kt">void</span> <span class="n">InnerProductLayer</span><span class="o">&lt;</span><span class="n">Dtype</span><span class="o">&gt;::</span><span class="n">LayerSetUp</span><span class="p">(</span><span class="k">const</span> <span class="n">vector</span><span class="o">&lt;</span><span class="n">Blob</span><span class="o">&lt;</span><span class="n">Dtype</span><span class="o">&gt;*&gt;&amp;</span> <span class="n">bottom</span><span class="p">,</span>
      <span class="k">const</span> <span class="n">vector</span><span class="o">&lt;</span><span class="n">Blob</span><span class="o">&lt;</span><span class="n">Dtype</span><span class="o">&gt;*&gt;&amp;</span> <span class="n">top</span><span class="p">)</span> <span class="p">{</span>
  <span class="c1">// Initialize from the LayerParameter read from the prototxt; the column
</span>  <span class="c1">// count of the output matrix Y is N_ = num_output
</span>  <span class="k">const</span> <span class="kt">int</span> <span class="n">num_output</span> <span class="o">=</span> <span class="k">this</span><span class="o">-&gt;</span><span class="n">layer_param_</span><span class="p">.</span><span class="n">inner_product_param</span><span class="p">().</span><span class="n">num_output</span><span class="p">();</span>
  <span class="p">......;</span>
  <span class="n">N_</span> <span class="o">=</span> <span class="n">num_output</span><span class="p">;</span>
  <span class="c1">// Flatten the input tensor from the given axis onward, so the row count of
</span>  <span class="c1">// the input matrix X is M_ = batchsize and its column count is K_
</span>  <span class="k">const</span> <span class="kt">int</span> <span class="n">axis</span> <span class="o">=</span> <span class="n">bottom</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">-&gt;</span><span class="n">CanonicalAxisIndex</span><span class="p">(</span>
      <span class="k">this</span><span class="o">-&gt;</span><span class="n">layer_param_</span><span class="p">.</span><span class="n">inner_product_param</span><span class="p">().</span><span class="n">axis</span><span class="p">());</span>
  <span class="n">K_</span> <span class="o">=</span> <span class="n">bottom</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">-&gt;</span><span class="n">count</span><span class="p">(</span><span class="n">axis</span><span class="p">);</span>
  <span class="k">if</span> <span class="p">(</span><span class="k">this</span><span class="o">-&gt;</span><span class="n">blobs_</span><span class="p">.</span><span class="n">size</span><span class="p">()</span> <span class="o">&gt;</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
    <span class="n">LOG</span><span class="p">(</span><span class="n">INFO</span><span class="p">)</span> <span class="o">&lt;&lt;</span> <span class="s">"Skipping parameter initialization"</span><span class="p">;</span>
  <span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
    <span class="p">......;</span>
    <span class="c1">// Initialize the weights
</span>    <span class="n">vector</span><span class="o">&lt;</span><span class="kt">int</span><span class="o">&gt;</span> <span class="n">weight_shape</span><span class="p">(</span><span class="mi">2</span><span class="p">);</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">transpose_</span><span class="p">)</span> <span class="p">{</span>
      <span class="n">weight_shape</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">=</span> <span class="n">K_</span><span class="p">;</span>
      <span class="n">weight_shape</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">=</span> <span class="n">N_</span><span class="p">;</span>
    <span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
      <span class="c1">// Note that by default the weight matrix W is N_ x K_, not the K x N of
</span>      <span class="c1">// Equation (2); i.e. W is stored in transposed form, so the forward pass
</span>      <span class="c1">// must transpose it, while the backward pass need not
</span>      <span class="n">weight_shape</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">=</span> <span class="n">N_</span><span class="p">;</span>
      <span class="n">weight_shape</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">=</span> <span class="n">K_</span><span class="p">;</span>
    <span class="p">}</span>
    <span class="k">this</span><span class="o">-&gt;</span><span class="n">blobs_</span><span class="p">[</span><span class="mi">0</span><span class="p">].</span><span class="n">reset</span><span class="p">(</span><span class="k">new</span> <span class="n">Blob</span><span class="o">&lt;</span><span class="n">Dtype</span><span class="o">&gt;</span><span class="p">(</span><span class="n">weight_shape</span><span class="p">));</span>
    <span class="p">......;</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">bias_term_</span><span class="p">)</span> <span class="p">{</span>
      <span class="n">vector</span><span class="o">&lt;</span><span class="kt">int</span><span class="o">&gt;</span> <span class="n">bias_shape</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="n">N_</span><span class="p">);</span>
      <span class="k">this</span><span class="o">-&gt;</span><span class="n">blobs_</span><span class="p">[</span><span class="mi">1</span><span class="p">].</span><span class="n">reset</span><span class="p">(</span><span class="k">new</span> <span class="n">Blob</span><span class="o">&lt;</span><span class="n">Dtype</span><span class="o">&gt;</span><span class="p">(</span><span class="n">bias_shape</span><span class="p">));</span>
    <span class="p">}</span>
  <span class="p">}</span>  <span class="c1">// parameter initialization
</span><span class="p">}</span>
</code></pre></div></div>
<p>From <code class="highlighter-rouge">void LayerSetUp(...)</code> we can take away three points. First, the layer is initialized from the LayerParameter read from the prototxt, and the number of columns of the output matrix Y is <code class="highlighter-rouge">N_ = num_output</code>. Second, the input tensor is flattened from the given axis onward, which makes the number of rows of the input matrix X equal to <code class="highlighter-rouge">M_ = batchsize</code> and its number of columns <code class="highlighter-rouge">K_</code>, consistent with <script type="math/tex">Equation \ (2)</script>. Third, note that by default the weight matrix W is <code class="highlighter-rouge">N_ x K_</code>, not the <code class="highlighter-rouge">K x N</code> of <script type="math/tex">Equation \ (2)</script>; in other words, W is stored in transposed form, so the forward pass must transpose it while the backward pass need not.</p>
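<p>The shape arithmetic behind the first two points can be sketched in a few lines. Here <code class="highlighter-rouge">count</code> is a hypothetical helper mirroring the behavior of <code class="highlighter-rouge">Blob::count</code>, and <code class="highlighter-rouge">top_shape_for</code> is our own name for the output-shape computation; neither is Caffe code:</p>

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (not Caffe code) mirroring Blob::count(begin, end):
// the product of the shape entries in [begin, end).
int count(const std::vector<int>& shape, int begin, int end) {
  int c = 1;
  for (int i = begin; i < end; ++i) c *= shape[i];
  return c;
}

// The top shape keeps the dims before `axis` and replaces the rest with N_.
std::vector<int> top_shape_for(const std::vector<int>& bottom_shape,
                               int axis, int N) {
  std::vector<int> top(bottom_shape.begin(),
                       bottom_shape.begin() + axis + 1);
  top[axis] = N;
  return top;
}
```

<p>For a bottom blob of shape (32, 3, 4, 5) with axis = 1, this gives M_ = 32, K_ = 60, and a top shape of (32, N_).</p>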

<p>With the data of the input matrix X and the parameters W, b in hand, parameters such as the memory needed for the output matrix Y can be determined. Caffe requires every concrete layer to provide a <code class="highlighter-rouge">void Reshape(...)</code> function that computes the resources its output needs. In <a href="https://github.com/BVLC/caffe/blob/master/src/caffe/layers/inner_product_layer.cpp#L57">caffe/src/caffe/layers/inner_product_layer.cpp</a>, class <code class="highlighter-rouge">InnerProductLayer</code> implements the virtual function <code class="highlighter-rouge">void Reshape(...)</code>:</p>
<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">template</span> <span class="o">&lt;</span><span class="k">typename</span> <span class="n">Dtype</span><span class="o">&gt;</span>
<span class="kt">void</span> <span class="n">InnerProductLayer</span><span class="o">&lt;</span><span class="n">Dtype</span><span class="o">&gt;::</span><span class="n">Reshape</span><span class="p">(</span><span class="k">const</span> <span class="n">vector</span><span class="o">&lt;</span><span class="n">Blob</span><span class="o">&lt;</span><span class="n">Dtype</span><span class="o">&gt;*&gt;&amp;</span> <span class="n">bottom</span><span class="p">,</span>
      <span class="k">const</span> <span class="n">vector</span><span class="o">&lt;</span><span class="n">Blob</span><span class="o">&lt;</span><span class="n">Dtype</span><span class="o">&gt;*&gt;&amp;</span> <span class="n">top</span><span class="p">)</span> <span class="p">{</span>
  <span class="p">......;</span>
  <span class="c1">// M_ = batchsize
</span>  <span class="n">M_</span> <span class="o">=</span> <span class="n">bottom</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">-&gt;</span><span class="n">count</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="n">axis</span><span class="p">);</span>
  <span class="c1">// reshape the output to M_ x N_
</span>  <span class="n">vector</span><span class="o">&lt;</span><span class="kt">int</span><span class="o">&gt;</span> <span class="n">top_shape</span> <span class="o">=</span> <span class="n">bottom</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">-&gt;</span><span class="n">shape</span><span class="p">();</span>
  <span class="n">top_shape</span><span class="p">.</span><span class="n">resize</span><span class="p">(</span><span class="n">axis</span> <span class="o">+</span> <span class="mi">1</span><span class="p">);</span>
  <span class="n">top_shape</span><span class="p">[</span><span class="n">axis</span><span class="p">]</span> <span class="o">=</span> <span class="n">N_</span><span class="p">;</span>
  <span class="n">top</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">-&gt;</span><span class="n">Reshape</span><span class="p">(</span><span class="n">top_shape</span><span class="p">);</span> <span class="c1">// !!! memory is allocated here !!!
</span>  <span class="k">if</span> <span class="p">(</span><span class="n">bias_term_</span><span class="p">)</span> <span class="p">{</span>
    <span class="n">vector</span><span class="o">&lt;</span><span class="kt">int</span><span class="o">&gt;</span> <span class="n">bias_shape</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="n">M_</span><span class="p">);</span>
    <span class="n">bias_multiplier_</span><span class="p">.</span><span class="n">Reshape</span><span class="p">(</span><span class="n">bias_shape</span><span class="p">);</span>
    <span class="n">caffe_set</span><span class="p">(</span><span class="n">M_</span><span class="p">,</span> <span class="n">Dtype</span><span class="p">(</span><span class="mi">1</span><span class="p">),</span> <span class="n">bias_multiplier_</span><span class="p">.</span><span class="n">mutable_cpu_data</span><span class="p">());</span>
  <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
<p>In the operations above, the shape of the output matrix Y is first determined to be <code class="highlighter-rouge">M_ x N_</code>, and then the <code class="highlighter-rouge">Reshape</code> function of the corresponding Blob <code class="highlighter-rouge">top[0]</code> is called to allocate the memory. The <code class="highlighter-rouge">Reshape</code> operation of class <code class="highlighter-rouge">Blob</code> is defined in <a href="https://github.com/BVLC/caffe/blob/master/src/caffe/blob.cpp#L22">caffe/src/caffe/blob.cpp</a> and will not be discussed further here.</p>
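<p>As an illustration, the shape bookkeeping above can be sketched in NumPy (a hypothetical sketch with made-up shapes and <code class="highlighter-rouge">num_output</code>, not Caffe code):</p>

```python
import numpy as np

# Hypothetical example: bottom blob of shape (batch, channels, height, width)
bottom_shape = (64, 3, 28, 28)
axis = 1                     # InnerProduct flattens everything from `axis` on
num_output = 1000            # N_: number of output neurons

# M_ = bottom[0]->count(0, axis): product of dims before `axis` (the batch size)
M_ = int(np.prod(bottom_shape[:axis]))
# K_ = bottom[0]->count(axis): product of the remaining dims (input features)
K_ = int(np.prod(bottom_shape[axis:]))
N_ = num_output

# top_shape = bottom shape truncated to axis + 1 entries, with top_shape[axis] = N_
top_shape = list(bottom_shape)[:axis + 1]
top_shape[axis] = N_

print(M_, K_, N_, top_shape)   # 64 2352 1000 [64, 1000]
```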

<p>At this point, Caffe has completed all the preparation needed for the GEMM operations, and the forward and backward computations can now be carried out.</p>

<p>For the forward computation</p>

<script type="math/tex; mode=display">Y = XW + B\tag{2}</script>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code>  <span class="k">const</span> <span class="n">Dtype</span><span class="o">*</span> <span class="n">bottom_data</span> <span class="o">=</span> <span class="n">bottom</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">-&gt;</span><span class="n">cpu_data</span><span class="p">();</span>
  <span class="n">Dtype</span><span class="o">*</span> <span class="n">top_data</span> <span class="o">=</span> <span class="n">top</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">-&gt;</span><span class="n">mutable_cpu_data</span><span class="p">();</span>
  <span class="k">const</span> <span class="n">Dtype</span><span class="o">*</span> <span class="n">weight</span> <span class="o">=</span> <span class="k">this</span><span class="o">-&gt;</span><span class="n">blobs_</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">-&gt;</span><span class="n">cpu_data</span><span class="p">();</span>
  <span class="n">caffe_cpu_gemm</span><span class="o">&lt;</span><span class="n">Dtype</span><span class="o">&gt;</span><span class="p">(</span><span class="n">CblasNoTrans</span><span class="p">,</span> <span class="n">transpose_</span> <span class="o">?</span> <span class="n">CblasNoTrans</span> <span class="o">:</span> <span class="n">CblasTrans</span><span class="p">,</span>
      <span class="n">M_</span><span class="p">,</span> <span class="n">N_</span><span class="p">,</span> <span class="n">K_</span><span class="p">,</span> <span class="p">(</span><span class="n">Dtype</span><span class="p">)</span><span class="mf">1.</span><span class="p">,</span>
      <span class="n">bottom_data</span><span class="p">,</span> <span class="n">weight</span><span class="p">,</span> <span class="p">(</span><span class="n">Dtype</span><span class="p">)</span><span class="mf">0.</span><span class="p">,</span> <span class="n">top_data</span><span class="p">);</span>
  <span class="k">if</span> <span class="p">(</span><span class="n">bias_term_</span><span class="p">)</span> <span class="p">{</span>
    <span class="n">caffe_cpu_gemm</span><span class="o">&lt;</span><span class="n">Dtype</span><span class="o">&gt;</span><span class="p">(</span><span class="n">CblasNoTrans</span><span class="p">,</span> <span class="n">CblasNoTrans</span><span class="p">,</span> <span class="n">M_</span><span class="p">,</span> <span class="n">N_</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="p">(</span><span class="n">Dtype</span><span class="p">)</span><span class="mf">1.</span><span class="p">,</span>
        <span class="n">bias_multiplier_</span><span class="p">.</span><span class="n">cpu_data</span><span class="p">(),</span>
        <span class="k">this</span><span class="o">-&gt;</span><span class="n">blobs_</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span><span class="o">-&gt;</span><span class="n">cpu_data</span><span class="p">(),</span> <span class="p">(</span><span class="n">Dtype</span><span class="p">)</span><span class="mf">1.</span><span class="p">,</span> <span class="n">top_data</span><span class="p">);</span>
  <span class="p">}</span>
</code></pre></div></div>
<p>As shown in the code above, the data corresponding to <code class="highlighter-rouge">X, W, b, Y</code> are first obtained as <code class="highlighter-rouge">bottom_data</code>, <code class="highlighter-rouge">weight</code>, <code class="highlighter-rouge">this-&gt;blobs_[1]-&gt;cpu_data()</code> and <code class="highlighter-rouge">top_data</code>, and then <code class="highlighter-rouge">caffe_cpu_gemm()</code> is executed. Note that the weight matrix W is transposed in this call, while the bias vector <code class="highlighter-rouge">b</code> is expanded into the matrix <code class="highlighter-rouge">B</code> via <script type="math/tex">B_{M,N} = [1, ..., 1]_{1,M}^{T} \times \vec{b}</script> and then added to the result matrix <code class="highlighter-rouge">Y</code>.</p>
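<p>The forward pass can be verified with a small NumPy sketch (hypothetical shapes; W is stored as <code class="highlighter-rouge">N_ x K_</code>, as in the non-transposed default, so the forward pass multiplies by its transpose):</p>

```python
import numpy as np

rng = np.random.default_rng(0)
M_, K_, N_ = 4, 5, 3

X = rng.standard_normal((M_, K_))   # bottom_data, M_ x K_
W = rng.standard_normal((N_, K_))   # weight blob, stored as N_ x K_
b = rng.standard_normal(N_)         # bias blob, length N_

# Y = X * W^T, as in caffe_cpu_gemm(CblasNoTrans, CblasTrans, ...)
Y = X @ W.T
# B = [1, ..., 1]^T_(M_,1) * b_(1,N_): the bias row repeated over the batch
ones = np.ones((M_, 1))
Y += ones @ b.reshape(1, N_)

# NumPy broadcasting of the bias row gives the same result
assert np.allclose(Y, X @ W.T + b)
```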

<p>For the gradient with respect to the weights</p>

<script type="math/tex; mode=display">\frac{\partial{Loss}}{\partial{W}} = [[\frac{\partial{Loss}}{\partial{Y}}]^{T} \cdot X]^{T}\tag{13}</script>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">if</span> <span class="p">(</span><span class="k">this</span><span class="o">-&gt;</span><span class="n">param_propagate_down_</span><span class="p">[</span><span class="mi">0</span><span class="p">])</span> <span class="p">{</span>
    <span class="k">const</span> <span class="n">Dtype</span><span class="o">*</span> <span class="n">top_diff</span> <span class="o">=</span> <span class="n">top</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">-&gt;</span><span class="n">cpu_diff</span><span class="p">();</span>
    <span class="k">const</span> <span class="n">Dtype</span><span class="o">*</span> <span class="n">bottom_data</span> <span class="o">=</span> <span class="n">bottom</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">-&gt;</span><span class="n">cpu_data</span><span class="p">();</span>
    <span class="c1">// Gradient with respect to weight
</span>    <span class="k">if</span> <span class="p">(</span><span class="n">transpose_</span><span class="p">)</span> <span class="p">{</span>
      <span class="n">caffe_cpu_gemm</span><span class="o">&lt;</span><span class="n">Dtype</span><span class="o">&gt;</span><span class="p">(</span><span class="n">CblasTrans</span><span class="p">,</span> <span class="n">CblasNoTrans</span><span class="p">,</span>
          <span class="n">K_</span><span class="p">,</span> <span class="n">N_</span><span class="p">,</span> <span class="n">M_</span><span class="p">,</span>
          <span class="p">(</span><span class="n">Dtype</span><span class="p">)</span><span class="mf">1.</span><span class="p">,</span> <span class="n">bottom_data</span><span class="p">,</span> <span class="n">top_diff</span><span class="p">,</span>
          <span class="p">(</span><span class="n">Dtype</span><span class="p">)</span><span class="mf">1.</span><span class="p">,</span> <span class="k">this</span><span class="o">-&gt;</span><span class="n">blobs_</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">-&gt;</span><span class="n">mutable_cpu_diff</span><span class="p">());</span>
    <span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
      <span class="n">caffe_cpu_gemm</span><span class="o">&lt;</span><span class="n">Dtype</span><span class="o">&gt;</span><span class="p">(</span><span class="n">CblasTrans</span><span class="p">,</span> <span class="n">CblasNoTrans</span><span class="p">,</span>
          <span class="n">N_</span><span class="p">,</span> <span class="n">K_</span><span class="p">,</span> <span class="n">M_</span><span class="p">,</span>
          <span class="p">(</span><span class="n">Dtype</span><span class="p">)</span><span class="mf">1.</span><span class="p">,</span> <span class="n">top_diff</span><span class="p">,</span> <span class="n">bottom_data</span><span class="p">,</span>
          <span class="p">(</span><span class="n">Dtype</span><span class="p">)</span><span class="mf">1.</span><span class="p">,</span> <span class="k">this</span><span class="o">-&gt;</span><span class="n">blobs_</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">-&gt;</span><span class="n">mutable_cpu_diff</span><span class="p">());</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>As shown in the code above, <code class="highlighter-rouge">top_diff</code> is <script type="math/tex">\frac{\partial{Loss}}{\partial{Y}}</script> computed by the upper layers, and <code class="highlighter-rouge">bottom_data</code> is this layer's input matrix X. By default the <code class="highlighter-rouge">else</code> branch is executed. Note that <code class="highlighter-rouge">top_diff</code> is transposed while <code class="highlighter-rouge">bottom_data</code> is not, which agrees with our derivation; however, their product does not need the additional transpose from the derivation, because the gradient blob of W, <code class="highlighter-rouge">this-&gt;blobs_[0]-&gt;mutable_cpu_diff()</code>, is stored as <script type="math/tex">N\_ \times K\_</script>, just like W itself.</p>
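<p>The weight-gradient GEMM of the <code class="highlighter-rouge">else</code> branch can be sketched in NumPy as follows (hypothetical shapes, for illustration only):</p>

```python
import numpy as np

rng = np.random.default_rng(1)
M_, K_, N_ = 4, 5, 3

X = rng.standard_normal((M_, K_))    # bottom_data
dY = rng.standard_normal((M_, N_))   # top_diff, dLoss/dY

# else branch: gemm(CblasTrans, CblasNoTrans, N_, K_, M_, top_diff, bottom_data)
# computes dW = dY^T * X, which already has the stored layout N_ x K_,
# so the extra transpose from the derivation is unnecessary.
dW = dY.T @ X
assert dW.shape == (N_, K_)
```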

<p>For the gradient with respect to the bias</p>

<script type="math/tex; mode=display">\frac{\partial{Loss}}{\partial{b}} = [[\frac{\partial{Loss}}{\partial{Y}}]^{T} \cdot [1, ..., 1]_{1,M}^{T}]^{T}\tag{14}</script>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">if</span> <span class="p">(</span><span class="n">bias_term_</span> <span class="o">&amp;&amp;</span> <span class="k">this</span><span class="o">-&gt;</span><span class="n">param_propagate_down_</span><span class="p">[</span><span class="mi">1</span><span class="p">])</span> <span class="p">{</span>
    <span class="k">const</span> <span class="n">Dtype</span><span class="o">*</span> <span class="n">top_diff</span> <span class="o">=</span> <span class="n">top</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">-&gt;</span><span class="n">cpu_diff</span><span class="p">();</span>
    <span class="c1">// Gradient with respect to bias
</span>    <span class="n">caffe_cpu_gemv</span><span class="o">&lt;</span><span class="n">Dtype</span><span class="o">&gt;</span><span class="p">(</span><span class="n">CblasTrans</span><span class="p">,</span> <span class="n">M_</span><span class="p">,</span> <span class="n">N_</span><span class="p">,</span> <span class="p">(</span><span class="n">Dtype</span><span class="p">)</span><span class="mf">1.</span><span class="p">,</span> <span class="n">top_diff</span><span class="p">,</span>
        <span class="n">bias_multiplier_</span><span class="p">.</span><span class="n">cpu_data</span><span class="p">(),</span> <span class="p">(</span><span class="n">Dtype</span><span class="p">)</span><span class="mf">1.</span><span class="p">,</span>
        <span class="k">this</span><span class="o">-&gt;</span><span class="n">blobs_</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span><span class="o">-&gt;</span><span class="n">mutable_cpu_diff</span><span class="p">());</span>
<span class="p">}</span>
</code></pre></div></div>
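<p>The <code class="highlighter-rouge">caffe_cpu_gemv()</code> call above multiplies the transposed <code class="highlighter-rouge">top_diff</code> by <code class="highlighter-rouge">bias_multiplier_</code>, i.e. it sums <script type="math/tex">\frac{\partial{Loss}}{\partial{Y}}</script> over the batch dimension and accumulates the result into the bias gradient. A NumPy sketch (hypothetical shapes, for illustration only):</p>

```python
import numpy as np

rng = np.random.default_rng(2)
M_, N_ = 4, 3
dY = rng.standard_normal((M_, N_))   # top_diff, dLoss/dY

# caffe_cpu_gemv(CblasTrans, M_, N_, 1., top_diff, bias_multiplier_, 1., db):
# db += dY^T * [1, ..., 1]^T, i.e. the column sums of dY over the batch
db = dY.T @ np.ones(M_)
assert np.allclose(db, dY.sum(axis=0))
```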

<p>For the gradient with respect to the input</p>

<script type="math/tex; mode=display">\frac{\partial{Loss}}{\partial{X}} = \frac{\partial{Loss}}{\partial{Y}} \cdot W^{T} \tag{19}</script>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">if</span> <span class="p">(</span><span class="n">propagate_down</span><span class="p">[</span><span class="mi">0</span><span class="p">])</span> <span class="p">{</span>
    <span class="k">const</span> <span class="n">Dtype</span><span class="o">*</span> <span class="n">top_diff</span> <span class="o">=</span> <span class="n">top</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">-&gt;</span><span class="n">cpu_diff</span><span class="p">();</span>
    <span class="c1">// Gradient with respect to bottom data
</span>    <span class="k">if</span> <span class="p">(</span><span class="n">transpose_</span><span class="p">)</span> <span class="p">{</span>
      <span class="n">caffe_cpu_gemm</span><span class="o">&lt;</span><span class="n">Dtype</span><span class="o">&gt;</span><span class="p">(</span><span class="n">CblasNoTrans</span><span class="p">,</span> <span class="n">CblasTrans</span><span class="p">,</span>
          <span class="n">M_</span><span class="p">,</span> <span class="n">K_</span><span class="p">,</span> <span class="n">N_</span><span class="p">,</span>
          <span class="p">(</span><span class="n">Dtype</span><span class="p">)</span><span class="mf">1.</span><span class="p">,</span> <span class="n">top_diff</span><span class="p">,</span> <span class="k">this</span><span class="o">-&gt;</span><span class="n">blobs_</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">-&gt;</span><span class="n">cpu_data</span><span class="p">(),</span>
          <span class="p">(</span><span class="n">Dtype</span><span class="p">)</span><span class="mf">0.</span><span class="p">,</span> <span class="n">bottom</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">-&gt;</span><span class="n">mutable_cpu_diff</span><span class="p">());</span>
    <span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
      <span class="n">caffe_cpu_gemm</span><span class="o">&lt;</span><span class="n">Dtype</span><span class="o">&gt;</span><span class="p">(</span><span class="n">CblasNoTrans</span><span class="p">,</span> <span class="n">CblasNoTrans</span><span class="p">,</span>
          <span class="n">M_</span><span class="p">,</span> <span class="n">K_</span><span class="p">,</span> <span class="n">N_</span><span class="p">,</span>
          <span class="p">(</span><span class="n">Dtype</span><span class="p">)</span><span class="mf">1.</span><span class="p">,</span> <span class="n">top_diff</span><span class="p">,</span> <span class="k">this</span><span class="o">-&gt;</span><span class="n">blobs_</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">-&gt;</span><span class="n">cpu_data</span><span class="p">(),</span>
          <span class="p">(</span><span class="n">Dtype</span><span class="p">)</span><span class="mf">0.</span><span class="p">,</span> <span class="n">bottom</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">-&gt;</span><span class="n">mutable_cpu_diff</span><span class="p">());</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
<p>As shown in the code above, <code class="highlighter-rouge">top_diff</code> is <script type="math/tex">\frac{\partial{Loss}}{\partial{Y}}</script> computed by the upper layers, and by default the <code class="highlighter-rouge">else</code> branch is executed. Note that, unlike the derivation, the weight matrix W does not need to be transposed here, because it is already stored as <script type="math/tex">N\_ \times K\_</script>.</p>
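<p>The bottom-gradient GEMM of the <code class="highlighter-rouge">else</code> branch can likewise be sketched in NumPy (hypothetical shapes, for illustration only):</p>

```python
import numpy as np

rng = np.random.default_rng(3)
M_, K_, N_ = 4, 5, 3
dY = rng.standard_normal((M_, N_))   # top_diff, dLoss/dY
W = rng.standard_normal((N_, K_))    # weight blob, stored as N_ x K_

# else branch: gemm(CblasNoTrans, CblasNoTrans, M_, K_, N_, top_diff, weight):
# dX = dY * W -- no transpose needed, since W is already N_ x K_
dX = dY @ W
assert dX.shape == (M_, K_)
```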

<p>The corresponding GPU implementation based on cuBLAS works on the same principle as the CPU version and is not repeated here; see <a href="https://github.com/BVLC/caffe/blob/master/src/caffe/layers/inner_product_layer.cu">caffe/src/caffe/layers/inner_product_layer.cu</a>.</p>

<p><br /></p>
<h2 id="总结">Summary</h2>
<hr />
<p><br />
This article has presented an analysis of the computations performed by Caffe's innerproduct layer, together with a walkthrough of the corresponding code; we hope it provides a useful reference for engineers working with Caffe.</p>

<p><br /></p>
<h2 id="引用">References</h2>
<hr />
<p><br />
[1] Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J., &amp; Girshick, R., et al. (2014). Caffe: Convolutional Architecture for Fast Feature Embedding. Acm International Conference on Multimedia (pp.675-678). ACM. <a href="http://caffe.berkeleyvision.org">http://caffe.berkeleyvision.org</a></p>

<p>[2] pytorch@github(2018), Tensors and Dynamic neural networks in Python with strong GPU acceleration. <a href="https://github.com/pytorch/pytorch">https://github.com/pytorch/pytorch</a></p>

<p>[3] tensorflow@github(2018), Computation using data flow graphs for scalable machine learning. <a href="https://github.com/tensorflow/tensorflow">https://github.com/tensorflow/tensorflow</a></p>

<p>[4] wikipedia(2018), Stochastic gradient descent. <a href="https://en.wikipedia.org/wiki/Stochastic_gradient_descent">https://en.wikipedia.org/wiki/Stochastic_gradient_descent</a></p>

<p>[5] petewarden.com(2015), Why GEMM is at the heart of deep learning. <a href="https://petewarden.com/2015/04/20/why-gemm-is-at-the-heart-of-deep-learning/">https://petewarden.com/2015/04/20/why-gemm-is-at-the-heart-of-deep-learning/</a></p>

<p>[6] wikipedia(2018), Basic Linear Algebra Subprograms. <a href="https://en.wikipedia.org/wiki/Basic_Linear_Algebra_Subprograms#Level_3">https://en.wikipedia.org/wiki/Basic_Linear_Algebra_Subprograms#Level_3</a></p>

<p>[7] lhcheung1991@github(2017), 卷积神经网络在 ARM-CPU 上的推断计算综述. <a href="https://lhcheung1991.github.io/blogs/2017/08/29/deeplearning-inference-benchmark-survey.html">https://lhcheung1991.github.io/blogs/2017/08/29/deeplearning-inference-benchmark-survey.html</a></p>

<p>[8] Intel Developer Zone(2018), cblas_?gemm Computes a matrix-matrix product with general matrices. <a href="https://software.intel.com/en-us/mkl-developer-reference-c-cblas-gemm">https://software.intel.com/en-us/mkl-developer-reference-c-cblas-gemm</a></p>

  </div>

  <!--  -->
  <div id="container"></div>
<link rel="stylesheet" href="https://imsun.github.io/gitment/style/default.css">
<script src="https://imsun.github.io/gitment/dist/gitment.browser.js"></script>
<script>
var gitment = new Gitment({
  id: '2018-04-16 00:00:00 +0000', // optional; defaults to location.href
  owner: 'lhCheung1991',
  repo: 'lhCheung1991.github.io',
  oauth: {
    client_id: 'cb41081866d5077f3ab8',
    client_secret: '765409e9d236eadec2af8515a58edc72a8b2c3d3',
  },
})
gitment.render('container')
</script>

</article>

      </div>
    </main><footer class="site-footer h-card">
  <data class="u-url" href="/"></data>

  <div class="wrapper">

    <h2 class="footer-heading">Linghan Cheung&#39;s Site</h2>

    <div class="footer-col-wrapper">
      <div class="footer-col footer-col-1">
        <ul class="contact-list">
          <li class="p-name">Linghan Cheung&#39;s Site</li><li><a class="u-email" href="mailto:lhcheung1991@gmail.com">lhcheung1991@gmail.com</a></li></ul>
      </div>

      <div class="footer-col footer-col-2"><ul class="social-media-list"><li><a href="https://github.com/lhCheung1991"><svg class="svg-icon"><use xlink:href="/assets/minima-social-icons.svg#github"></use></svg> <span class="username">lhCheung1991</span></a></li></ul>
</div>

      <div class="footer-col footer-col-3">
        <p>攻无不克战无不胜的毛泽东思想万岁！
</p>
      </div>
    </div>

  </div>

</footer>
</body>

</html>
